question retrieval
QEQR: An Exploration of Query Expansion Methods for Question Retrieval in CQA Services
Ghafourian, Yasin, Movahedi, Sajad, Shakery, Azadeh
CQA services are valuable sources of knowledge that can be used to find answers to users' information needs. In these services, question retrieval aims to help users with their information needs by finding questions similar to theirs. However, finding similar questions is obstructed by the lexical gap that exists between relevant questions. In this work, we target this problem by using query expansion methods. We use word-similarity-based methods, propose a question-similarity-based method, and apply selective expansion with these methods to expand a submitted question and mitigate the lexical gap problem. Our best method achieves a significant relative improvement of 1.8% compared to the best-performing baseline without query expansion.
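The word-similarity-based expansion mentioned in this abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the toy vectors, the function name expand_query, and the top_k and threshold parameters are hypothetical and are not taken from the paper, which does not specify its procedure here.

```python
import math

# Toy word vectors with made-up values (assumption, illustration only).
TOY_VECTORS = {
    "laptop": [0.9, 0.1, 0.0],
    "notebook": [0.85, 0.15, 0.05],
    "computer": [0.8, 0.2, 0.1],
    "fix": [0.1, 0.9, 0.0],
    "repair": [0.12, 0.88, 0.05],
    "banana": [0.0, 0.1, 0.95],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def expand_query(terms, vectors, top_k=2, threshold=0.95):
    """Append to the query up to top_k similar words per original term.

    A candidate is added only if its cosine similarity to the term
    reaches the threshold, which keeps unrelated words out.
    """
    expanded = list(terms)
    for term in terms:
        if term not in vectors:
            continue  # no vector available, so nothing to expand with
        scored = [
            (cosine(vectors[term], vec), word)
            for word, vec in vectors.items()
            if word != term and word not in expanded
        ]
        scored.sort(reverse=True)  # most similar candidates first
        for score, word in scored[:top_k]:
            if score >= threshold and word not in expanded:
                expanded.append(word)
    return expanded


expanded_terms = expand_query(["fix", "laptop"], TOY_VECTORS)
```

The expanded term list would then be passed to the retrieval model in place of the original query, so that a candidate question phrased with "repair" or "notebook" can still match.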
Beyond Contrastive Learning: A Variational Generative Model for Multilingual Retrieval
Wieting, John, Clark, Jonathan H., Cohen, William W., Neubig, Graham, Berg-Kirkpatrick, Taylor
Contrastive learning has been successfully used for retrieval of semantically aligned sentences, but it often requires large batch sizes or careful engineering to work well. In this paper, we instead propose a generative model for learning multilingual text embeddings which can be used to retrieve or score sentence pairs. Our model operates on parallel data in $N$ languages and, through an approximation we introduce, efficiently encourages source separation in this multilingual setting, separating semantic information that is shared between translations from stylistic or language-specific variation. We show careful large-scale comparisons between contrastive and generation-based approaches for learning multilingual text embeddings, a comparison that has not been done to the best of our knowledge despite the popularity of these approaches. We evaluate this method on a suite of tasks including semantic similarity, bitext mining, and cross-lingual question retrieval -- the last of which we introduce in this paper. Overall, our Variational Multilingual Source-Separation Transformer (VMSST) model outperforms both a strong contrastive and generative baseline on these tasks.
An Unsupervised Model With Attention Autoencoders for Question Retrieval
Zhang, Minghua (Peking University) | Wu, Yunfang (Peking University, Institute of Computational Linguistics)
Question retrieval is a crucial subtask for community question answering. Previous research focuses on supervised models, which depend heavily on training data and manual feature engineering. In this paper, we propose a novel unsupervised framework, namely reduced attentive matching network (RAMN), to compute semantic matching between two questions. Our RAMN integrates the deep semantic representations, the shallow lexical mismatching information and the initial rank produced by an external search engine. For the first time, we propose attention autoencoders to generate semantic representations of questions. In addition, we employ lexical mismatching to capture surface matching between two questions, which is derived from the importance of each word in a question. We conduct experiments on the open CQA datasets of SemEval-2016 and SemEval-2017. The experimental results show that our unsupervised model obtains comparable performance with the state-of-the-art supervised methods in SemEval-2016 Task 3, and outperforms the best system in SemEval-2017 Task 3 by a wide margin.
Convolutional Neural Tensor Network Architecture for Community-Based Question Answering
Qiu, Xipeng (Fudan University) | Huang, Xuanjing (Fudan University)
Retrieving similar questions is very important in community-based question answering. A major challenge is the lexical gap in sentence matching. In this paper, we propose a convolutional neural tensor network architecture to encode the sentences in semantic space and model their interactions with a tensor layer. Our model integrates sentence modeling and semantic matching into a single model, which can not only capture the useful information with convolutional and pooling layers, but also learn the matching metrics between the question and its answer. Besides, our model is a general architecture, with no need for other knowledge such as lexical or syntactic analysis. The experimental results show that our method outperforms the other methods on two matching tasks.
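As a rough illustration of how a tensor layer can score the interaction between two sentence vectors, the sketch below uses the generic neural tensor layer form s = u . tanh(q^T M a + V [q; a] + b). This is an assumption: the abstract does not give the exact formulation, the convolutional sentence encoders are omitted, and all names and values are hypothetical.

```python
import math


def tensor_layer_score(q, a, tensor, linear, bias, out_weights):
    """Score a pair of sentence vectors with a generic neural tensor layer.

    q, a        : sentence vectors (lists of floats)
    tensor      : one len(q) x len(a) matrix per slice (bilinear term)
    linear      : one weight row per slice over the concatenation q + a
    bias        : one bias per slice
    out_weights : one output weight per slice
    """
    score = 0.0
    for M, v_row, b, u in zip(tensor, linear, bias, out_weights):
        # Bilinear interaction q^T M a for this slice.
        bilinear = sum(
            q[i] * M[i][j] * a[j]
            for i in range(len(q))
            for j in range(len(a))
        )
        # Standard linear term over the concatenated vectors.
        lin = sum(w * x for w, x in zip(v_row, q + a))
        score += u * math.tanh(bilinear + lin + b)
    return score


# Single slice with an identity matrix: the score grows when q and a align.
identity_slice = [[[1.0, 0.0], [0.0, 1.0]]]
aligned = tensor_layer_score(
    [1.0, 0.0], [1.0, 0.0], identity_slice, [[0.0] * 4], [0.0], [1.0]
)
orthogonal = tensor_layer_score(
    [1.0, 0.0], [0.0, 1.0], identity_slice, [[0.0] * 4], [0.0], [1.0]
)
```

In a trained model the slices, linear weights, and output weights would be learned jointly with the sentence encoders; here they are fixed by hand only to show the computation.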
Exploring Key Concept Paraphrasing Based on Pivot Language Translation for Question Retrieval
Zhang, Wei-Nan (Harbin Institute of Technology) | Ming, Zhao-Yan (Digipen Institute of Technology) | Zhang, Yu (Harbin Institute of Technology) | Liu, Ting (Harbin Institute of Technology) | Chua, Tat-Seng (National University of Singapore)
Question retrieval in current community-based question answering (CQA) services does not, in general, work well for long and complex queries. One of the main difficulties lies in the word mismatch between queries and candidate questions. Existing solutions try to expand the queries at word level, but they usually fail to consider concept level enrichment. In this paper, we explore a pivot language translation based approach to derive the paraphrases of key concepts. We further propose a unified question retrieval model which integrates the key concepts and their paraphrases for the query question. Experimental results demonstrate that the paraphrase enhanced retrieval model significantly outperforms the state-of-the-art models in question retrieval.
Mining Query Subtopics from Questions in Community Question Answering
Wu, Yu (Beihang University) | Wu, Wei (Microsoft Research Asia) | Li, Zhoujun (Beihang University) | Zhou, Ming (Microsoft Research Asia)
This paper proposes mining query subtopics from questions in community question answering (CQA). The subtopics are represented as a number of clusters of questions with keywords summarizing the clusters. The task is unique in that the subtopics from questions can not only facilitate user browsing in CQA search, but also describe aspects of queries from a question-answering perspective. The challenges of the task include how to group semantically similar questions and how to find keywords capable of summarizing the clusters. We formulate the subtopic mining task as a non-negative matrix factorization (NMF) problem and further extend the model of NMF to incorporate question similarity estimated from metadata of CQA into learning. Compared with existing methods, our method can jointly optimize question clustering and keyword extraction and encourage the former task to enhance the latter. Experimental results on large scale real world CQA datasets show that the proposed method significantly outperforms the existing methods in terms of keyword extraction, while achieving a comparable performance to the state-of-the-art methods for question clustering.
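The plain NMF step underlying this formulation can be sketched as follows. This is only standard NMF with multiplicative updates on a toy question-by-term matrix; the paper's extension that incorporates question similarity from CQA metadata is not reproduced, and the matrix values, function names, and parameters are assumptions made for illustration.

```python
import random


def matmul(A, B):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def frobenius_error(X, W, H):
    """Squared Frobenius norm of X - WH."""
    WH = matmul(W, H)
    return sum((x - y) ** 2 for rx, ry in zip(X, WH) for x, y in zip(rx, ry))


def nmf(X, rank, iterations=200, eps=1e-9, seed=0):
    """Factor non-negative X (questions x terms) into W (questions x rank)
    and H (rank x terms) using multiplicative updates."""
    rng = random.Random(seed)
    n, m = len(X), len(X[0])
    W = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    H = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    for _ in range(iterations):
        # H <- H * (W^T X) / (W^T W H)
        Wt = transpose(W)
        num, den = matmul(Wt, X), matmul(matmul(Wt, W), H)
        H = [[H[k][j] * num[k][j] / (den[k][j] + eps) for j in range(m)]
             for k in range(rank)]
        # W <- W * (X H^T) / (W H H^T)
        Ht = transpose(H)
        num, den = matmul(X, Ht), matmul(W, matmul(H, Ht))
        W = [[W[i][k] * num[i][k] / (den[i][k] + eps) for k in range(rank)]
             for i in range(n)]
    return W, H


# Toy term counts: rows are questions, columns are terms (made-up data).
X = [
    [2, 1, 0, 0],
    [1, 2, 0, 0],
    [0, 0, 1, 2],
    [0, 0, 2, 1],
]
W, H = nmf(X, rank=2)
# Each question is assigned to the factor with the largest weight in its row of W.
clusters = [row.index(max(row)) for row in W]
```

Under this reading, the rows of W give the question clusters, while the largest entries in each row of H point to the terms that could serve as keywords for that cluster.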
Improving Question Retrieval in Community Question Answering Using World Knowledge
Zhou, Guangyou (Chinese Academy of Sciences) | Liu, Yang (Chinese Academy of Sciences) | Liu, Fang (Chinese Academy of Sciences) | Zeng, Daojian (Institute of Automation, Chinese Academy of Sciences) | Zhao, Jun (Chinese Academy of Sciences)
Community question answering (cQA), which provides a platform for people with diverse background to share information and knowledge, has become an increasingly popular research topic. In this paper, we focus on the task of question retrieval. The key problem of question retrieval is to measure the similarity between the queried questions and the historical questions which have been solved by other users. The traditional methods measure the similarity based on the bag-of-words (BOWs) representation. This representation neither captures dependencies between related words, nor handles synonyms or polysemous words. In this work, we first propose a way to build a concept thesaurus based on the semantic relations extracted from the world knowledge of Wikipedia. Then, we develop a unified framework to leverage these semantic relations in order to enhance the question similarity in the concept space. Experiments conducted on a real cQA data set show that with the help of Wikipedia thesaurus, the performance of question retrieval is improved as compared to the traditional methods.